Diffuse large B-cell lymphoma (DLBCL) is a complex and aggressive malignancy. The standard-of-care chemo-immunotherapy regimen, R-CHOP (rituximab, cyclophosphamide, doxorubicin, vincristine, and prednisone), leads to a complete response in most patients. Unfortunately, 30-40% of patients are either refractory to this regimen or relapse after a complete response, and these patients have a dismal prognosis. Gene expression profiling has delineated two distinct molecular subtypes of DLBCL, the germinal center B-cell-like (GCB) subtype and the activated B-cell-like (ABC) subtype, while 10 to 15% of cases remain unclassifiable. These groups differ in survival and could potentially direct therapy (Alizadeh, Nature 2000). Schmitz et al. used whole-exome and transcriptome data from 574 DLBCL tumors to refine these genetic subtypes, in an attempt to improve prognostic capacity, focusing on protein-coding genes (Schmitz, N Engl J Med. 2018). Non-coding RNAs (ncRNAs) have been shown to be differentially expressed and to cluster across different groups of DLBCL samples, suggesting that ncRNAs may also participate in DLBCL pathogenesis (Shi, OncoTargets and Therapy 2020).

In this study, we applied machine learning methods to classify DLBCL genetic subgroups based on ncRNA expression and propose a clinical-genetic survival prediction model.

Of the 1866 ncRNAs in the study by Schmitz et al., 377 were selected using an information gain algorithm and were used to classify 234 DLBCL tumors into the genetic subgroups (ABC, GCB, Unclassified). Classification models were trained using k-nearest neighbors (KNN, K=5), decision tree, random forest, and multilayer perceptron algorithms, yielding weighted areas under the ROC curve of 0.895, 0.749, 0.924, and 0.965, respectively.
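To illustrate the feature-selection step, the following is a minimal sketch of information gain scoring in plain Python. It is not the pipeline used in this study: the median split used to discretize expression, the function names, and the toy feature names and values are all assumptions made for illustration.

```python
import math
from collections import Counter
from statistics import median


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(expression, labels):
    """Information gain of one feature after a median split.

    The median split is an assumed discretization, chosen only for this sketch.
    """
    cutoff = median(expression)
    high = [lab for x, lab in zip(expression, labels) if x > cutoff]
    low = [lab for x, lab in zip(expression, labels) if x <= cutoff]
    conditional = sum(
        (len(part) / len(labels)) * entropy(part) for part in (high, low) if part
    )
    return entropy(labels) - conditional


def select_features(matrix, labels, threshold=0.0):
    """Return {feature name: gain} for features whose gain exceeds the threshold."""
    gains = {name: information_gain(values, labels) for name, values in matrix.items()}
    return {name: g for name, g in gains.items() if g > threshold}


# Toy data (made up): six tumors and two hypothetical ncRNA features.
labels = ["ABC", "ABC", "GCB", "GCB", "Unclassified", "Unclassified"]
toy = {
    "ncRNA_informative": [9.0, 8.5, 1.0, 1.2, 1.1, 0.9],
    "ncRNA_constant": [3.0, 3.0, 3.0, 3.0, 3.0, 3.0],
}
selected = select_features(toy, labels)
print(sorted(selected))
```

In this toy example only the feature whose expression separates some of the subgroups is retained; the constant feature carries no information about the labels and is dropped.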

Using the information gain algorithm, we identified 28 ncRNAs with an information gain score >0 for classifying patients by whether they survived three years. Of these, seven ncRNAs were significantly correlated with overall survival (OS) (p<0.05 for each) in Cox regression survival analysis. In multivariate analysis including age, gender, ECOG, IPI, genetic subgroup, and these seven ncRNAs, only age and three ncRNAs (NR_026893, NR_002939, NR_002186) were significantly associated with OS (Figure 1A). We performed Kaplan-Meier analysis using these three ncRNAs as binary variables (with the median as the cutoff), dividing the cohort into three groups (all three ncRNAs up-regulated, one or two down-regulated, and all three down-regulated) with a robust difference in OS (median OS was not reached, 5.5 years [95% CI 1.1-9.9], and 1.5 years [95% CI 0.4-2.7], respectively), as presented in Figure 1B.
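The grouping and Kaplan-Meier steps described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the analysis code of this study: the helper names (`risk_group`, `kaplan_meier`, `median_survival`), the group labels, and all expression and follow-up values are hypothetical; only the three ncRNA identifiers and the median cutoff come from the text.

```python
from statistics import median


def risk_group(levels, cutoffs):
    """Assign a group by counting markers expressed above their cutoff."""
    n_up = sum(levels[name] > cutoffs[name] for name in cutoffs)
    if n_up == len(cutoffs):
        return "all_up"
    if n_up == 0:
        return "all_down"
    return "mixed"  # one or two markers down-regulated


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times: follow-up in years; events: 1 = death observed, 0 = censored.
    """
    curve, survival = [], 1.0
    at_risk = len(times)
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time survival falls to 0.5 or below; None if not reached."""
    for t, s in curve:
        if s <= 0.5 + 1e-12:  # small tolerance for floating-point rounding
            return t
    return None


markers = ["NR_026893", "NR_002939", "NR_002186"]
# Toy cohort (made up): (expression per marker, follow-up years, event flag).
patients = [
    ({"NR_026893": 5.0, "NR_002939": 6.0, "NR_002186": 7.0}, 6.0, 0),
    ({"NR_026893": 5.5, "NR_002939": 6.5, "NR_002186": 7.5}, 7.0, 0),
    ({"NR_026893": 5.2, "NR_002939": 1.0, "NR_002186": 7.2}, 2.0, 1),
    ({"NR_026893": 1.0, "NR_002939": 6.2, "NR_002186": 1.0}, 5.0, 0),
    ({"NR_026893": 1.1, "NR_002939": 1.1, "NR_002186": 1.1}, 1.0, 1),
    ({"NR_026893": 1.2, "NR_002939": 1.2, "NR_002186": 1.2}, 2.0, 1),
]
# Median expression across the cohort serves as the cutoff for each marker.
cutoffs = {m: median(p[0][m] for p in patients) for m in markers}

groups = {}
for levels, years, event in patients:
    times, events = groups.setdefault(risk_group(levels, cutoffs), ([], []))
    times.append(years)
    events.append(event)

medians = {g: median_survival(kaplan_meier(t, e)) for g, (t, e) in groups.items()}
print(medians)
```

A median of `None` corresponds to "median OS not reached", as in a group where the survival estimate never falls to 50% during follow-up. Confidence intervals and the Cox models are outside the scope of this sketch.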

In conclusion, we identified novel diagnostic and prognostic ncRNA biomarkers that could potentially inform clinical management of patients with DLBCL. Further study of ncRNA expression profiles and cellular mechanisms could improve our understanding of the disease, identify new therapeutic targets, and support the development of new therapies.

Disclosures

Avivi: Kite, a Gilead Company: Speakers Bureau; Novartis: Speakers Bureau. Cohen: Karophram: Membership on an entity's Board of Directors or advisory committees, Research Funding; GSK: Consultancy, Membership on an entity's Board of Directors or advisory committees; Amgen: Membership on an entity's Board of Directors or advisory committees, Research Funding; Janssen: Membership on an entity's Board of Directors or advisory committees; Takeda: Membership on an entity's Board of Directors or advisory committees, Research Funding; Neopharm / promedico: Consultancy, Membership on an entity's Board of Directors or advisory committees, Research Funding.
